Overview

Dataset statistics

Number of variables: 16
Number of observations: 222500
Missing cells: 234318
Missing cells (%): 6.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.8 MiB
Average record size in memory: 65.0 B
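The derived figures in this block can be cross-checked from the raw counts. The sketch below assumes that "Missing cells (%)" is missing cells divided by variables times observations, and that the average record size is the total size divided by the number of observations; the variable names are illustrative and not part of the report.

```python
# Raw figures copied from the dataset statistics.
n_variables = 16
n_observations = 222500
missing_cells = 234318
total_size_mib = 13.8

# Assumed definitions of the derived figures.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(total_cells)                   # 3560000
print(f"{missing_pct:.1f}%")         # 6.6%
print(f"{avg_record_bytes:.1f} B")   # 65.0 B
```

Both derived values agree with the reported 6.6% and 65.0 B after rounding.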

Variable types

NUM: 8
CAT: 6
BOOL: 2

Reproduction

Analysis started: 2020-03-26 17:53:19.113986
Analysis finished: 2020-03-26 19:22:23.734206
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Census_OEMNameIdentifier has 2243 (1.0%) missing values
Census_OEMModelIdentifier has 2401 (1.1%) missing values
Census_ProcessorClass has 221498 (99.5%) missing values

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE
Distinct count: 222500
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 223242.75157752808
Minimum: 3
Maximum: 446244
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.7 MiB

Quantile statistics

Minimum: 3
5-th percentile: 22424.95
Q1: 111764.5
median: 223196.5
Q3: 334939.25
95-th percentile: 423972.1
Maximum: 446244
Range: 446241
Interquartile range (IQR): 223174.75

Descriptive statistics

Standard deviation: 128837.2552
Coefficient of variation (CV): 0.577117305
Kurtosis: -1.20097224
Mean: 223242.7516
Median Absolute Deviation (MAD): 111586.7464
Skewness: 0.0003429752968
Sum: 4.967151223e+10
Variance: 1.659903832e+10
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[3.00000e+00 4.46244e+05], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
4094 | 1 | < 0.1%
204086 | 1 | < 0.1%
62795 | 1 | < 0.1%
58697 | 1 | < 0.1%
60744 | 1 | < 0.1%
36164 | 1 | < 0.1%
46403 | 1 | < 0.1%
48450 | 1 | < 0.1%
42305 | 1 | < 0.1%
44352 | 1 | < 0.1%
Other values (222490) | 222490 | > 99.9%

Minimum 5 values

Value | Count | Frequency (%)
3 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
446244 | 1 | < 0.1%
446241 | 1 | < 0.1%
446235 | 1 | < 0.1%
446231 | 1 | < 0.1%
446224 | 1 | < 0.1%
 
Census_MDC2FormFactor
Categorical

Distinct count: 12
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 217.8 KiB

Common values

Value | Count | Frequency (%)
Notebook | 144007 | 64.7%
Desktop | 50461 | 22.7%
Convertible | 10173 | 4.6%
AllInOne | 7410 | 3.3%
Detachable | 5698 | 2.6%
PCOther | 3278 | 1.5%
LargeTablet | 1012 | 0.5%
SmallTablet | 264 | 0.1%
SmallServer | 131 | 0.1%
MediumServer | 47 | < 0.1%
Other values (2) | 19 | < 0.1%

Length

Max length: 12
Mean length: 7.966930337
Min length: 7

Unicode categories

Value | Count | Frequency (%)
Lowercase_Letter | 19 | 63.3%
Uppercase_Letter | 11 | 36.7%

Unicode scripts

Value | Count | Frequency (%)
Latin | 30 | 100.0%

Unicode blocks

Value | Count | Frequency (%)
ASCII | 30 | 100.0%
 
Census_DeviceFamily
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 217.5 KiB

Common values

Value | Count | Frequency (%)
Windows.Desktop | 222258 | 99.9%
Windows.Server | 241 | 0.1%
Windows | 1 | < 0.1%

Length

Max length: 15
Mean length: 14.9988809
Min length: 7

Unicode categories

Value | Count | Frequency (%)
Lowercase_Letter | 12 | 75.0%
Uppercase_Letter | 3 | 18.8%
Other_Punctuation | 1 | 6.2%

Unicode scripts

Value | Count | Frequency (%)
Latin | 15 | 93.8%
Common | 1 | 6.2%

Unicode blocks

Value | Count | Frequency (%)
ASCII | 16 | 100.0%
 

Census_OEMNameIdentifier
Real number (ℝ≥0)

MISSING
Distinct count: 1009
Unique (%): 0.5%
Missing: 2243
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2197.5884
Minimum: 74.0
Maximum: 6143.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 869.3 KiB

Quantile statistics

Minimum: 74
5-th percentile: 525
Q1: 1443
median: 2102
Q3: 2668
95-th percentile: 4730
Maximum: 6143
Range: 6069
Interquartile range (IQR): 1225

Descriptive statistics

Standard deviation: 1298.672241
Coefficient of variation (CV): 0.5909533501
Kurtosis: -0.3988922536
Mean: 2197.588379
Median Absolute Deviation (MAD): 996.3306274
Skewness: 0.5477777123
Sum: 484034240
Variance: 1686549.625
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2668 | 32453 | 14.6%
2102 | 26892 | 12.1%
1443 | 24130 | 10.8%
2206 | 22459 | 10.1%
585 | 22198 | 10.0%
525 | 21947 | 9.9%
4589 | 8562 | 3.8%
1980 | 7915 | 3.6%
4730 | 7243 | 3.3%
4142 | 4624 | 2.1%
Other values (999) | 41834 | 18.8%

Minimum 5 values

Value | Count | Frequency (%)
74 | 3 | < 0.1%
82 | 2 | < 0.1%
86 | 5 | < 0.1%
165 | 3 | < 0.1%
176 | 16 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
6143 | 1 | < 0.1%
6142 | 1 | < 0.1%
6095 | 1 | < 0.1%
6086 | 2 | < 0.1%
6062 | 12 | < 0.1%
 

Census_OEMModelIdentifier
Real number (ℝ≥0)

MISSING
Distinct count: 24613
Unique (%): 11.2%
Missing: 2401
Missing (%): 1.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 238712.25
Minimum: 23.0
Maximum: 345496.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 869.3 KiB

Quantile statistics

Minimum: 23
5-th percentile: 83257
Q1: 189586
median: 245971
Q3: 303366
95-th percentile: 331210.0938
Maximum: 345496
Range: 345473
Interquartile range (IQR): 113780

Descriptive statistics

Standard deviation: 72003.53125
Coefficient of variation (CV): 0.3016331494
Kurtosis: 1.158289671
Mean: 238712.25
Median Absolute Deviation (MAD): 54601.05078
Skewness: -0.9952222109
Sum: 5.254032589e+10
Variance: 5184508416
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
313586 | 8448 | 3.8%
242491 | 6770 | 3.0%
317701 | 3736 | 1.7%
317708 | 3087 | 1.4%
188345 | 2031 | 0.9%
245824 | 2012 | 0.9%
228975 | 2004 | 0.9%
241876 | 1793 | 0.8%
244755 | 1349 | 0.6%
248045 | 1047 | 0.5%
Other values (24603) | 187822 | 84.4%
(Missing) | 2401 | 1.1%

Minimum 5 values

Value | Count | Frequency (%)
23 | 2 | < 0.1%
150 | 4 | < 0.1%
156 | 6 | < 0.1%
167 | 1 | < 0.1%
171 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
345496 | 1 | < 0.1%
345490 | 1 | < 0.1%
345485 | 1 | < 0.1%
345433 | 1 | < 0.1%
345410 | 1 | < 0.1%
 

Census_ProcessorCoreCount
Real number (ℝ≥0)

Distinct count: 20
Unique (%): < 0.1%
Missing: 1094
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: nan
Minimum: 1.0
Maximum: 64.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 434.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 4
Q3: 4
95-th percentile: 8
Maximum: 64
Range: 63
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0
Coefficient of variation (CV): nan
Kurtosis: nan
Mean: nan
Median Absolute Deviation (MAD): nan
Skewness: nan
Sum: inf
Variance: 0
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
4 | 137649 | 61.9%
2 | 52982 | 23.8%
8 | 24015 | 10.8%
12 | 2741 | 1.2%
6 | 1968 | 0.9%
1 | 1032 | 0.5%
16 | 521 | 0.2%
3 | 308 | 0.1%
32 | 58 | < 0.1%
24 | 50 | < 0.1%
Other values (10) | 82 | < 0.1%
(Missing) | 1094 | 0.5%

Minimum 5 values

Value | Count | Frequency (%)
1 | 1032 | 0.5%
2 | 52982 | 23.8%
3 | 308 | 0.1%
4 | 137649 | 61.9%
5 | 4 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
64 | 3 | < 0.1%
56 | 4 | < 0.1%
48 | 4 | < 0.1%
40 | 12 | < 0.1%
36 | 6 | < 0.1%
 
Census_ProcessorManufacturerIdentifier
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 1095
Missing (%): 0.5%
Memory size: 434.7 KiB

Common values

Value | Count | Frequency (%)
5 | 196107 | 88.1%
1 | 25296 | 11.4%
3 | 2 | < 0.1%
(Missing) | 1095 | 0.5%

Length

Max length: 3
Mean length: 3
Min length: 3

Unicode categories

Value | Count | Frequency (%)
Decimal_Number | 4 | 57.1%
Lowercase_Letter | 2 | 28.6%
Other_Punctuation | 1 | 14.3%

Unicode scripts

Value | Count | Frequency (%)
Common | 5 | 71.4%
Latin | 2 | 28.6%

Unicode blocks

Value | Count | Frequency (%)
ASCII | 7 | 100.0%
 

Census_ProcessorModelIdentifier
Real number (ℝ≥0)

Distinct count: 1887
Unique (%): 0.9%
Missing: 1095
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 2389.8523
Minimum: 19.0
Maximum: 4472.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 869.3 KiB

Quantile statistics

Minimum: 19
5-th percentile: 311
Q1: 2056
median: 2523
Q3: 2883
95-th percentile: 3426
Maximum: 4472
Range: 4453
Interquartile range (IQR): 827

Descriptive statistics

Standard deviation: 828.9053955
Coefficient of variation (CV): 0.3468437791
Kurtosis: 1.441684008
Mean: 2389.852295
Median Absolute Deviation (MAD): 580.838562
Skewness: -1.051075339
Sum: 529125248
Variance: 687084.125
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2697 | 7829 | 3.5%
1998 | 6575 | 3.0%
2660 | 5332 | 2.4%
2373 | 4639 | 2.1%
2382 | 4480 | 2.0%
2640 | 4245 | 1.9%
1992 | 4101 | 1.8%
2737 | 3215 | 1.4%
3063 | 3208 | 1.4%
1985 | 3184 | 1.4%
Other values (1877) | 174597 | 78.5%

Minimum 5 values

Value | Count | Frequency (%)
19 | 34 | < 0.1%
23 | 4 | < 0.1%
25 | 7 | < 0.1%
27 | 3 | < 0.1%
29 | 94 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
4472 | 1 | < 0.1%
4469 | 1 | < 0.1%
4468 | 1 | < 0.1%
4446 | 1 | < 0.1%
4437 | 1 | < 0.1%
 

Census_ProcessorClass
Categorical

MISSING
Distinct count: 3
Unique (%): 0.3%
Missing: 221498
Missing (%): 99.5%
Memory size: 217.5 KiB

Common values

Value | Count | Frequency (%)
mid | 566 | 0.3%
low | 228 | 0.1%
high | 208 | 0.1%
(Missing) | 221498 | 99.5%

Length

Max length: 4
Mean length: 3.000934831
Min length: 3

Unicode categories

Value | Count | Frequency (%)
Lowercase_Letter | 10 | 100.0%

Unicode scripts

Value | Count | Frequency (%)
Latin | 10 | 100.0%

Unicode blocks

Value | Count | Frequency (%)
ASCII | 10 | 100.0%
 

Census_PrimaryDiskTotalCapacity
Real number (ℝ≥0)

Distinct count: 605
Unique (%): 0.3%
Missing: 1379
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 531557.4154377015
Minimum: 0.0
Maximum: 11445248.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 1.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 59640
Q1: 244198
median: 476940
Q3: 953869
95-th percentile: 953869
Maximum: 11445248
Range: 11445248
Interquartile range (IQR): 709671

Descriptive statistics

Standard deviation: 357291.7955
Coefficient of variation (CV): 0.6721603069
Kurtosis: 11.87550453
Mean: 531557.4154
Median Absolute Deviation (MAD): 279611.7992
Skewness: 1.401335489
Sum: 1.175385073e+11
Variance: 1.276574271e+11
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
476940 | 70823 | 31.8%
953869 | 58447 | 26.3%
122104 | 12382 | 5.6%
244198 | 11876 | 5.3%
305245 | 10518 | 4.7%
238475 | 7541 | 3.4%
114473 | 7054 | 3.2%
29820 | 6368 | 2.9%
715404 | 6196 | 2.8%
228936 | 4720 | 2.1%
Other values (595) | 25196 | 11.3%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
10227 | 1 | < 0.1%
14800 | 24 | < 0.1%
14910 | 2 | < 0.1%
14912 | 19 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
11445248 | 1 | < 0.1%
11444736 | 1 | < 0.1%
5723166 | 2 | < 0.1%
5723091 | 1 | < 0.1%
5376000 | 1 | < 0.1%
 
Census_PrimaryDiskTypeName
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 289
Missing (%): 0.1%
Memory size: 217.6 KiB

Common values

Value | Count | Frequency (%)
HDD | 146901 | 66.0%
SSD | 61254 | 27.5%
UNKNOWN | 8014 | 3.6%
Unspecified | 6042 | 2.7%
(Missing) | 289 | 0.1%

Length

Max length: 11
Mean length: 3.36131236
Min length: 3

Unicode categories

Value | Count | Frequency (%)
Lowercase_Letter | 9 | 52.9%
Uppercase_Letter | 8 | 47.1%

Unicode scripts

Value | Count | Frequency (%)
Latin | 17 | 100.0%

Unicode blocks

Value | Count | Frequency (%)
ASCII | 17 | 100.0%
 
Census_SystemVolumeTotalCapacity
Real number (ℝ≥0)

Distinct count: 83754
Unique (%): 37.9%
Missing: 1378
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 382266.9740686137
Minimum: 0.0
Maximum: 5375384.0
Zeros: 2
Zeros (%): < 0.1%
Memory size: 1.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 40960
Q1: 121027
median: 250203
Q3: 476164
95-th percentile: 952728
Maximum: 5375384
Range: 5375384
Interquartile range (IQR): 355137

Descriptive statistics

Standard deviation: 323012.6715
Coefficient of variation (CV): 0.8449923571
Kurtosis: 4.306829133
Mean: 382266.9741
Median Absolute Deviation (MAD): 252225.7663
Skewness: 1.557900475
Sum: 8.452763784e+10
Variance: 1.043371859e+11
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
926992 | 1263 | 0.6%
953253 | 1119 | 0.5%
476389 | 1115 | 0.5%
28542 | 1106 | 0.5%
476324 | 1042 | 0.5%
952728 | 961 | 0.4%
102400 | 945 | 0.4%
475799 | 864 | 0.4%
476323 | 841 | 0.4%
952792 | 835 | 0.4%
Other values (83744) | 211031 | 94.8%
(Missing) | 1378 | 0.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 2 | < 0.1%
7667 | 1 | < 0.1%
9676 | 1 | < 0.1%
9782 | 1 | < 0.1%
10000 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
5375384 | 1 | < 0.1%
3906691 | 1 | < 0.1%
3815430 | 1 | < 0.1%
3814881 | 1 | < 0.1%
3814880 | 1 | < 0.1%
 
Census_HasOpticalDiskDrive
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 MiB

Value | Count | Frequency (%)
0 | 203997 | 91.7%
1 | 18503 | 8.3%
 

Census_TotalPhysicalRAM
Real number (ℝ≥0)

Distinct count: 226
Unique (%): 0.1%
Missing: 1831
Missing (%): 0.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 6422.0474
Minimum: 768.0
Maximum: 262144.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 869.3 KiB

Quantile statistics

Minimum: 768
5-th percentile: 2048
Q1: 4096
median: 4096
Q3: 8192
95-th percentile: 16384
Maximum: 262144
Range: 261376
Interquartile range (IQR): 4096

Descriptive statistics

Standard deviation: 5048.120117
Coefficient of variation (CV): 0.7860608697
Kurtosis: 148.8574677
Mean: 6422.047363
Median Absolute Deviation (MAD): 3208.66626
Skewness: 6.831326485
Sum: 1417146752
Variance: 25483518
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
4096 | 101747 | 45.7%
8192 | 58972 | 26.5%
2048 | 22297 | 10.0%
16384 | 15231 | 6.8%
6144 | 10444 | 4.7%
12288 | 4489 | 2.0%
3072 | 3241 | 1.5%
32768 | 1632 | 0.7%
1024 | 809 | 0.4%
24576 | 359 | 0.2%
Other values (216) | 1448 | 0.7%
(Missing) | 1831 | 0.8%

Minimum 5 values

Value | Count | Frequency (%)
768 | 1 | < 0.1%
991 | 1 | < 0.1%
1013 | 1 | < 0.1%
1014 | 1 | < 0.1%
1015 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
262144 | 2 | < 0.1%
196608 | 2 | < 0.1%
131072 | 28 | < 0.1%
98304 | 3 | < 0.1%
90112 | 1 | < 0.1%
 
Census_ChassisTypeName
Categorical

Distinct count: 35
Unique (%): < 0.1%
Missing: 15
Missing (%): < 0.1%
Memory size: 218.9 KiB

Common values

Value | Count | Frequency (%)
Notebook | 131600 | 59.1%
Desktop | 48638 | 21.9%
Laptop | 16821 | 7.6%
Portable | 8758 | 3.9%
AllinOne | 5185 | 2.3%
MiniTower | 2217 | 1.0%
Convertible | 2030 | 0.9%
UNKNOWN | 1530 | 0.7%
LowProfileDesktop | 1278 | 0.6%
Other | 985 | 0.4%
Other values (25) | 3443 | 1.5%

Length

Max length: 19
Mean length: 7.717640449
Min length: 1

Unicode categories

Value | Count | Frequency (%)
Lowercase_Letter | 21 | 50.0%
Uppercase_Letter | 17 | 40.5%
Decimal_Number | 4 | 9.5%

Unicode scripts

Value | Count | Frequency (%)
Latin | 38 | 90.5%
Common | 4 | 9.5%

Unicode blocks

Value | Count | Frequency (%)
ASCII | 42 | 100.0%
 

HasDetections
Boolean

CONSTANT
REJECTED
Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 MiB

Value | Count | Frequency (%)
1 | 222500 | 100.0%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear relationship the steepness of the slope does not affect r; only its sign does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
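As a minimal sketch of this definition, the function below divides the covariance by the product of the standard deviations using only the standard library. The name `pearson_r` is illustrative; the input values are the core counts and RAM sizes of the first five rows shown in the Sample section.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


cores = [4.0, 2.0, 8.0, 2.0, 2.0]                   # Census_ProcessorCoreCount
ram_mb = [8192.0, 4096.0, 16384.0, 4096.0, 2048.0]  # Census_TotalPhysicalRAM

r = pearson_r(cores, ram_mb)
# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(cores, [2 * y + 3 for y in ram_mb])
print(r, r_rescaled)
```

Five rows are far too few to say anything about the full dataset; the point is only to show the calculation and the invariance property.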

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
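The same recipe can be sketched in code by ranking each variable and then applying the covariance-over-standard-deviations formula to the ranks. The helper names and the toy data are illustrative; assigning tied values the average of their ranks is a common convention and is assumed here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]       # monotonic but nonlinear
print(spearman_rho(xs, ys))     # 1.0
print(pearson_r(xs, ys) < 1.0)  # True
```

The cubic relationship is perfectly monotonic, so ρ reaches 1 while r stays below it.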

Kendall's τ

Similar to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
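The pair-counting definition translates directly into a short, quadratic-time sketch. This is the variant usually called tau-a, in which tied pairs count as neither concordant nor discordant; statistical libraries often default to a tie-adjusted variant (for example, `scipy.stats.kendalltau` uses tau-b by default), so results can differ when ties are present. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in x or y; it counts toward neither.
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))            # 1.0
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))            # -1.0
print(round(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]), 3))  # 0.667
```

In the last call, five of the six pairs are concordant and one is discordant, giving (5 - 1) / 6.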

Missing values

Sample

First rows

row | df_index | Census_MDC2FormFactor | Census_DeviceFamily | Census_OEMNameIdentifier | Census_OEMModelIdentifier | Census_ProcessorCoreCount | Census_ProcessorManufacturerIdentifier | Census_ProcessorModelIdentifier | Census_ProcessorClass | Census_PrimaryDiskTotalCapacity | Census_PrimaryDiskTypeName | Census_SystemVolumeTotalCapacity | Census_HasOpticalDiskDrive | Census_TotalPhysicalRAM | Census_ChassisTypeName | HasDetections
0 | 3 | Notebook | Windows.Desktop | 1443.0 | 256685.0 | 4.0 | 5.0 | 3063.0 | NaN | 953869.0 | HDD | 149500.0 | 0 | 8192.0 | Notebook | 1
1 | 5 | Notebook | Windows.Desktop | 4894.0 | 272191.0 | 2.0 | 5.0 | 2097.0 | NaN | 953869.0 | HDD | 953160.0 | 0 | 4096.0 | Notebook | 1
2 | 6 | Desktop | Windows.Desktop | 1443.0 | 275893.0 | 8.0 | 5.0 | 2962.0 | NaN | 244198.0 | SSD | 243645.0 | 0 | 16384.0 | MiniTower | 1
3 | 9 | Notebook | Windows.Desktop | 585.0 | 189551.0 | 2.0 | 5.0 | 2097.0 | NaN | 476940.0 | HDD | 475798.0 | 1 | 4096.0 | Notebook | 1
4 | 10 | Desktop | Windows.Desktop | 924.0 | 193916.0 | 2.0 | 5.0 | 3195.0 | NaN | 476940.0 | HDD | 198572.0 | 0 | 2048.0 | Desktop | 1
5 | 11 | Notebook | Windows.Desktop | 525.0 | 331192.0 | 4.0 | 5.0 | 2321.0 | NaN | 476940.0 | HDD | 100000.0 | 0 | 2048.0 | Notebook | 1
6 | 12 | Notebook | Windows.Desktop | 2206.0 | 229872.0 | 2.0 | 5.0 | 1983.0 | NaN | 476940.0 | HDD | 461312.0 | 0 | 2048.0 | Notebook | 1
7 | 15 | AllInOne | Windows.Desktop | 3035.0 | 263666.0 | 4.0 | 5.0 | 2407.0 | NaN | 228936.0 | SSD | 228320.0 | 0 | 4096.0 | Desktop | 1
8 | 17 | Desktop | Windows.Desktop | 3035.0 | 263637.0 | 8.0 | 5.0 | 2966.0 | NaN | 1907729.0 | HDD | 1906339.0 | 0 | 16384.0 | Desktop | 1
9 | 18 | Notebook | Windows.Desktop | 4730.0 | 310837.0 | 4.0 | 5.0 | 2296.0 | NaN | 715404.0 | HDD | 704278.0 | 0 | 6144.0 | Notebook | 1

Last rows

row | df_index | Census_MDC2FormFactor | Census_DeviceFamily | Census_OEMNameIdentifier | Census_OEMModelIdentifier | Census_ProcessorCoreCount | Census_ProcessorManufacturerIdentifier | Census_ProcessorModelIdentifier | Census_ProcessorClass | Census_PrimaryDiskTotalCapacity | Census_PrimaryDiskTypeName | Census_SystemVolumeTotalCapacity | Census_HasOpticalDiskDrive | Census_TotalPhysicalRAM | Census_ChassisTypeName | HasDetections
222490 | 446217 | Notebook | Windows.Desktop | 2102.0 | 229920.0 | 2.0 | 5.0 | 1998.0 | NaN | 476940.0 | HDD | 461953.0 | 0 | 4096.0 | Notebook | 1
222491 | 446218 | Notebook | Windows.Desktop | 525.0 | 331178.0 | 8.0 | 5.0 | 2737.0 | NaN | 953869.0 | HDD | 952792.0 | 0 | 8192.0 | Notebook | 1
222492 | 446219 | Notebook | Windows.Desktop | 585.0 | 189211.0 | 4.0 | 5.0 | 3499.0 | NaN | 122104.0 | SSD | 121488.0 | 0 | 4096.0 | Notebook | 1
222493 | 446221 | Notebook | Windows.Desktop | 585.0 | 189592.0 | 4.0 | 5.0 | 3063.0 | NaN | 122104.0 | SSD | 122102.0 | 0 | 8192.0 | Notebook | 1
222494 | 446223 | Notebook | Windows.Desktop | 2206.0 | 244755.0 | 4.0 | 1.0 | 187.0 | NaN | 476940.0 | HDD | 455604.0 | 0 | 4096.0 | Notebook | 1
222495 | 446224 | Notebook | Windows.Desktop | 525.0 | 331054.0 | 2.0 | 5.0 | 2097.0 | NaN | 476940.0 | HDD | 475863.0 | 0 | 4096.0 | Notebook | 1
222496 | 446231 | Notebook | Windows.Desktop | 525.0 | 331260.0 | 2.0 | 5.0 | 2012.0 | NaN | 476940.0 | HDD | 454536.0 | 1 | 4096.0 | Notebook | 1
222497 | 446235 | Desktop | Windows.Desktop | 3150.0 | 328141.0 | 8.0 | 5.0 | 2891.0 | NaN | 102399.0 | SSD | 101898.0 | 0 | 12288.0 | Desktop | 1
222498 | 446241 | AllInOne | Windows.Desktop | 666.0 | 340349.0 | 4.0 | 5.0 | 2510.0 | NaN | 953869.0 | HDD | 204800.0 | 0 | 16384.0 | AllinOne | 1
222499 | 446244 | Desktop | Windows.Desktop | 1443.0 | 275846.0 | 2.0 | 5.0 | 3212.0 | NaN | 238418.0 | UNKNOWN | 99147.0 | 0 | 3072.0 | MiniTower | 1